Individual glacier analysis 1

This notebook will walk you through steps to read in and organize velocity data and clip it to the extent of a single glacier. The tools we will use include xarray, rioxarray and geopandas.

To clip its_live data to the extent of a single glacier we will use a vector dataset of glacier outlines, the Randolph Glacier Inventory. These aren’t cloud-hosted currently so you will need to download the data to your local machine.

Learning goals

  • using xarray to read zarr data from an s3 bucket

  • rio.clip() to clip raster by vector

  • viewing CRS, reprojecting and writing CRS data for various objects

  • dataset.where()

  • dataset.sel() using multiple conditions

  • groupby

First, let's import the Python libraries that were listed on the Software page:

import geopandas as gpd
import os
import numpy as np
import xarray as xr
import rioxarray as rxr
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
from shapely.geometry import Polygon
from shapely.geometry import Point
import cartopy.crs as ccrs
from cartopy.mpl.gridliner import LONGITUDE_FORMATTER, LATITUDE_FORMATTER
import cartopy
import cartopy.feature as cfeature
import json
import urllib.request
from skimage.morphology import skeletonize
import pandas as pd
import seaborn as sns 
%config InlineBackend.figure_format='retina'

Reading in ITS_LIVE data

We will use some of the functions we defined in the data access notebook to read in data here. First, let’s read in the catalog again:

#import itslivetools
with urllib.request.urlopen('https://its-live-data.s3.amazonaws.com/datacubes/catalog_v02.json') as url_catalog:
    itslive_catalog = json.loads(url_catalog.read().decode())
itslive_catalog.keys()
dict_keys(['type', 'features'])

Take a look at a single catalog entry:

Use the function below to find the url that corresponds to the zarr datacube for a specific point:

def find_granule_by_point(input_dict, input_point): #[lon,lat]
    '''Takes an input dictionary (a geojson catalog) and a [lon, lat] point representing the AOI.
    Returns a list of the s3 urls corresponding to zarr datacubes whose footprint covers the AOI.'''

    target_granule_urls = []

    point_geom = Point(input_point[0], input_point[1])
    point_gdf = gpd.GeoDataFrame(crs='epsg:4326', geometry=[point_geom])

    for granule in range(len(input_dict['features'])):
        bbox_ls = input_dict['features'][granule]['geometry']['coordinates'][0]
        bbox_geom = Polygon(bbox_ls)
        bbox_gdf = gpd.GeoDataFrame(index=[0], crs='epsg:4326', geometry=[bbox_geom])

        # keep this granule if its footprint contains the AOI point
        if bbox_gdf.contains(point_gdf).all():
            target_granule_urls.append(input_dict['features'][granule]['properties']['zarr_url'])

    return target_granule_urls
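Since the footprint test only needs a plain point-in-polygon check (both layers are already in epsg:4326), a leaner equivalent of the function above can skip building a GeoDataFrame per footprint and use shapely directly. This is a sketch assuming the same catalog structure, not a replacement used later in the notebook:

```python
from shapely.geometry import Point, Polygon

def find_granule_by_point_fast(input_dict, input_point):
    '''Same idea as find_granule_by_point, but tests containment
    with bare shapely geometries instead of per-feature GeoDataFrames.'''
    point_geom = Point(input_point[0], input_point[1])
    return [feature['properties']['zarr_url']
            for feature in input_dict['features']
            if Polygon(feature['geometry']['coordinates'][0]).contains(point_geom)]
```

Skipping the per-feature GeoDataFrame construction makes the loop noticeably faster over a catalog with many features, at the cost of not validating CRS metadata.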

This function will read in an xarray dataset from a url pointing to a zarr datacube when we’re ready:

I started with chunks='auto' but ran into issues; the dask documentation has more on choosing good chunk sizes.

def read_in_s3(http_url):
    '''Rewrite an http url as its s3 equivalent and open the zarr datacube as an xarray dataset.'''
    s3_url = http_url.replace('http', 's3')
    s3_url = s3_url.replace('.s3.amazonaws.com', '')

    datacube = xr.open_dataset(s3_url, engine='zarr',
                               storage_options={'anon': True},
                               chunks='auto')

    return datacube
def get_bbox_single(input_xr):
    '''Takes an input xr object (an itslive data cube) and returns a GeoDataFrame
    containing its footprint polygon, in the CRS read from the cube's mapping variable.'''

    xmin = input_xr.coords['x'].data.min()
    xmax = input_xr.coords['x'].data.max()
    ymin = input_xr.coords['y'].data.min()
    ymax = input_xr.coords['y'].data.max()

    pts_ls = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax), (xmin, ymin)]

    crs = f"epsg:{input_xr.mapping.spatial_epsg}"

    polygon_geom = Polygon(pts_ls)
    polygon = gpd.GeoDataFrame(index=[0], crs=crs, geometry=[polygon_geom])

    return polygon
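The string rewrite inside read_in_s3 just swaps the url scheme and drops the endpoint suffix. As a standalone sketch of the same replace logic (note it assumes an http://, not https://, url, which is what the catalog provides):

```python
def http_to_s3(http_url):
    '''Rewrite an ITS_LIVE http url to the equivalent s3 path
    (same two-step replace as in read_in_s3).'''
    return http_url.replace('http', 's3').replace('.s3.amazonaws.com', '')

# 'http://its-live-data.s3.amazonaws.com/datacubes/v02/...' becomes
# 's3://its-live-data/datacubes/v02/...'
print(http_to_s3('http://its-live-data.s3.amazonaws.com/datacubes/v02/N20E080/'
                 'ITS_LIVE_vel_EPSG32645_G0120_X250000_Y3150000.zarr'))
```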
url = find_granule_by_point(itslive_catalog, [84.56, 28.54])
url
['http://its-live-data.s3.amazonaws.com/datacubes/v02/N20E080/ITS_LIVE_vel_EPSG32645_G0120_X250000_Y3150000.zarr']
dc = read_in_s3(url[0])
dc
<xarray.Dataset>
Dimensions:                    (mid_date: 20549, y: 833, x: 833)
Coordinates:
  * mid_date                   (mid_date) datetime64[ns] 2020-02-07T17:10:52....
  * x                          (x) float64 2.001e+05 2.002e+05 ... 2.999e+05
  * y                          (y) float64 3.2e+06 3.2e+06 ... 3.1e+06 3.1e+06
Data variables: (12/54)
    acquisition_date_img1      (mid_date) datetime64[ns] dask.array<chunksize=(20549,), meta=np.ndarray>
    acquisition_date_img2      (mid_date) datetime64[ns] dask.array<chunksize=(20549,), meta=np.ndarray>
    autoRIFT_software_version  (mid_date) <U5 dask.array<chunksize=(20549,), meta=np.ndarray>
    chip_size_height           (mid_date, y, x) float32 dask.array<chunksize=(20549, 40, 40), meta=np.ndarray>
    chip_size_width            (mid_date, y, x) float32 dask.array<chunksize=(20549, 40, 40), meta=np.ndarray>
    date_center                (mid_date) datetime64[ns] dask.array<chunksize=(20549,), meta=np.ndarray>
    ...                         ...
    vy_error_mask              (mid_date) float64 dask.array<chunksize=(20549,), meta=np.ndarray>
    vy_error_modeled           (mid_date) float64 dask.array<chunksize=(20549,), meta=np.ndarray>
    vy_error_slow              (mid_date) float64 dask.array<chunksize=(20549,), meta=np.ndarray>
    vy_stable_shift            (mid_date) float64 dask.array<chunksize=(20549,), meta=np.ndarray>
    vy_stable_shift_mask       (mid_date) float64 dask.array<chunksize=(20549,), meta=np.ndarray>
    vy_stable_shift_slow       (mid_date) float64 dask.array<chunksize=(20549,), meta=np.ndarray>
Attributes: (12/18)
    GDAL_AREA_OR_POINT:         Area
    author:                     ITS_LIVE, a NASA MEaSUREs project (its-live.j...
    autoRIFT_parameter_file:    http://its-live-data.s3.amazonaws.com/autorif...
    datacube_software_version:  1.0
    date_created:               09-Jun-2022 04:14:40
    date_updated:               09-Jun-2022 04:14:40
    ...                         ...
    s3:                         s3://its-live-data/datacubes/v02/N20E080/ITS_...
    skipped_granules:           s3://its-live-data/datacubes/v02/N20E080/ITS_...
    time_standard_img1:         UTC
    time_standard_img2:         UTC
    title:                      ITS_LIVE datacube of image_pair velocities
    url:                        https://its-live-data.s3.amazonaws.com/datacu...

We are reading this in as a dask array. Let’s take a look at the chunk sizes:

NOTE: on a Dataset, chunksizes and chunks return the same mapping of dimension name to chunk sizes (as you can see below). On a DataArray they differ: .chunks returns a bare tuple of chunk sizes, while .chunksizes returns the named mapping, which is easier to read if you have irregular chunks.

dc.chunksizes
Frozen({'mid_date': (20549,), 'y': (40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 33), 'x': (40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 33)})
dc.chunks
Frozen({'mid_date': (20549,), 'y': (40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 33), 'x': (40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 33)})
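To see where the two properties actually differ, here is a small sketch on a toy DataArray (assuming dask is installed; the dimension names and chunk sizes are made up for illustration):

```python
import numpy as np
import xarray as xr

# a 6x4 array chunked into blocks of 4 along 't' and 2 along 'x'
da = xr.DataArray(np.zeros((6, 4)), dims=('t', 'x')).chunk({'t': 4, 'x': 2})

print(da.chunks)      # tuple form: ((4, 2), (2, 2))
print(da.chunksizes)  # mapping form, keyed by dimension name
```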

Note the dask chunk structure here: with chunks='auto', the entire mid_date dimension ends up in a single chunk. This becomes a problem a few steps below, where sorting along mid_date triggers a performance warning; we’ll look at how to handle it there.

Check CRS of xr object:

dc.mapping
<xarray.DataArray 'mapping' ()>
array('', dtype='<U1')
Attributes:
    CoordinateAxisTypes:      GeoX GeoY
    CoordinateTransformType:  Projection
    GeoTransform:             200032.5 120.0 0 3199927.5 0 -120.0
    grid_mapping_name:        universal_transverse_mercator
    inverse_flattening:       298.257223563
    semi_major_axis:          6378137.0
    spatial_epsg:             32645
    spatial_proj4:            +proj=utm +zone=45 +datum=WGS84 +units=m +no_defs
    spatial_ref:              PROJCS["WGS 84 / UTM zone 45N",GEOGCS["WGS 84",...
    utm_zone_number:          45.0

Let’s take a look at the time dimension (mid_date here). To start with we’ll just print the first 10 values:

for element in range(10):
    
    print(dc.mid_date[element].data)
2020-02-07T17:10:52.528083968
2021-11-11T05:10:40.529083904
2015-03-18T00:19:03.524224768
2019-10-10T17:13:32.528083968
2016-07-15T05:06:36.529083904
2021-08-10T17:10:34.529083904
2021-04-15T00:19:35.894403072
2018-06-27T17:09:39.528083968
2019-12-22T05:10:48.528083968
2020-07-29T05:10:51.528083968

Weird, it doesn’t look like the time dimension is in chronological order, let’s fix that:

dc_timesorted = dc.sortby(dc['mid_date'])
dc_timesorted
/Users/emarshall/miniconda3/envs/mynewbook/lib/python3.10/site-packages/xarray/core/indexing.py:1228: PerformanceWarning: Slicing is producing a large chunk. To accept the large
chunk and silence this warning, set the option
    >>> with dask.config.set(**{'array.slicing.split_large_chunks': False}):
    ...     array[indexer]

To avoid creating the large chunks, set the option
    >>> with dask.config.set(**{'array.slicing.split_large_chunks': True}):
    ...     array[indexer]
  return self.array[key]
<xarray.Dataset>
Dimensions:                    (mid_date: 20549, y: 833, x: 833)
Coordinates:
  * mid_date                   (mid_date) datetime64[ns] 2013-04-09T16:49:00....
  * x                          (x) float64 2.001e+05 2.002e+05 ... 2.999e+05
  * y                          (y) float64 3.2e+06 3.2e+06 ... 3.1e+06 3.1e+06
Data variables: (12/54)
    acquisition_date_img1      (mid_date) datetime64[ns] dask.array<chunksize=(20549,), meta=np.ndarray>
    acquisition_date_img2      (mid_date) datetime64[ns] dask.array<chunksize=(20549,), meta=np.ndarray>
    autoRIFT_software_version  (mid_date) <U5 dask.array<chunksize=(20549,), meta=np.ndarray>
    chip_size_height           (mid_date, y, x) float32 dask.array<chunksize=(20549, 40, 40), meta=np.ndarray>
    chip_size_width            (mid_date, y, x) float32 dask.array<chunksize=(20549, 40, 40), meta=np.ndarray>
    date_center                (mid_date) datetime64[ns] dask.array<chunksize=(20549,), meta=np.ndarray>
    ...                         ...
    vy_error_mask              (mid_date) float64 dask.array<chunksize=(20549,), meta=np.ndarray>
    vy_error_modeled           (mid_date) float64 dask.array<chunksize=(20549,), meta=np.ndarray>
    vy_error_slow              (mid_date) float64 dask.array<chunksize=(20549,), meta=np.ndarray>
    vy_stable_shift            (mid_date) float64 dask.array<chunksize=(20549,), meta=np.ndarray>
    vy_stable_shift_mask       (mid_date) float64 dask.array<chunksize=(20549,), meta=np.ndarray>
    vy_stable_shift_slow       (mid_date) float64 dask.array<chunksize=(20549,), meta=np.ndarray>
Attributes: (12/18)
    GDAL_AREA_OR_POINT:         Area
    author:                     ITS_LIVE, a NASA MEaSUREs project (its-live.j...
    autoRIFT_parameter_file:    http://its-live-data.s3.amazonaws.com/autorif...
    datacube_software_version:  1.0
    date_created:               09-Jun-2022 04:14:40
    date_updated:               09-Jun-2022 04:14:40
    ...                         ...
    s3:                         s3://its-live-data/datacubes/v02/N20E080/ITS_...
    skipped_granules:           s3://its-live-data/datacubes/v02/N20E080/ITS_...
    time_standard_img1:         UTC
    time_standard_img2:         UTC
    title:                      ITS_LIVE datacube of image_pair velocities
    url:                        https://its-live-data.s3.amazonaws.com/datacu...

When we read in the zarr datacube as an xr.Dataset we set the chunk sizes to 'auto'. When we try to sort along the mid_date dimension this becomes a problem and we get the warning above.

Let’s follow the instructions in the warning message to avoid creating large chunks:

import dask
with dask.config.set(**{'array.slicing.split_large_chunks': True}):
    dc_timesorted = dc.sortby(dc['mid_date'])
    dc_timesorted
dc_timesorted
<xarray.Dataset>
Dimensions:                    (mid_date: 20549, y: 833, x: 833)
Coordinates:
  * mid_date                   (mid_date) datetime64[ns] 2013-04-09T16:49:00....
  * x                          (x) float64 2.001e+05 2.002e+05 ... 2.999e+05
  * y                          (y) float64 3.2e+06 3.2e+06 ... 3.1e+06 3.1e+06
Data variables: (12/54)
    acquisition_date_img1      (mid_date) datetime64[ns] dask.array<chunksize=(20549,), meta=np.ndarray>
    acquisition_date_img2      (mid_date) datetime64[ns] dask.array<chunksize=(20549,), meta=np.ndarray>
    autoRIFT_software_version  (mid_date) <U5 dask.array<chunksize=(20549,), meta=np.ndarray>
    chip_size_height           (mid_date, y, x) float32 dask.array<chunksize=(49, 40, 40), meta=np.ndarray>
    chip_size_width            (mid_date, y, x) float32 dask.array<chunksize=(49, 40, 40), meta=np.ndarray>
    date_center                (mid_date) datetime64[ns] dask.array<chunksize=(20549,), meta=np.ndarray>
    ...                         ...
    vy_error_mask              (mid_date) float64 dask.array<chunksize=(20549,), meta=np.ndarray>
    vy_error_modeled           (mid_date) float64 dask.array<chunksize=(20549,), meta=np.ndarray>
    vy_error_slow              (mid_date) float64 dask.array<chunksize=(20549,), meta=np.ndarray>
    vy_stable_shift            (mid_date) float64 dask.array<chunksize=(20549,), meta=np.ndarray>
    vy_stable_shift_mask       (mid_date) float64 dask.array<chunksize=(20549,), meta=np.ndarray>
    vy_stable_shift_slow       (mid_date) float64 dask.array<chunksize=(20549,), meta=np.ndarray>
Attributes: (12/18)
    GDAL_AREA_OR_POINT:         Area
    author:                     ITS_LIVE, a NASA MEaSUREs project (its-live.j...
    autoRIFT_parameter_file:    http://its-live-data.s3.amazonaws.com/autorif...
    datacube_software_version:  1.0
    date_created:               09-Jun-2022 04:14:40
    date_updated:               09-Jun-2022 04:14:40
    ...                         ...
    s3:                         s3://its-live-data/datacubes/v02/N20E080/ITS_...
    skipped_granules:           s3://its-live-data/datacubes/v02/N20E080/ITS_...
    time_standard_img1:         UTC
    time_standard_img2:         UTC
    title:                      ITS_LIVE datacube of image_pair velocities
    url:                        https://its-live-data.s3.amazonaws.com/datacu...
for element in range(10):
    
    print(dc_timesorted.mid_date[element].data)
2013-04-09T16:49:00.528084992
2013-04-23T04:55:55.528083968
2013-04-25T16:49:09.528084992
2013-05-01T04:56:02.028083968
2013-05-03T16:49:08.028084992
2013-05-09T04:56:00.528083968
2013-05-09T04:56:03.528083968
2013-05-10T04:49:51.028084992
2013-05-17T04:56:02.028083968
2013-05-18T04:49:49.528084992

Read in vector data

We are going to read in RGI region 15 (SouthAsiaEast). RGI data is downloaded in lat/lon coordinates. We will project it to match the CRS of the ITS_LIVE dataset and then select an individual glacier to begin our analysis.

se_asia = gpd.read_file('/Users/emarshall/Desktop/siparcs/data/nsidc0770_15.rgi60.SouthAsiaEast/15_rgi60_SouthAsiaEast.shp')
se_asia.head(3)
RGIId GLIMSId BgnDate EndDate CenLon CenLat O1Region O2Region Area Zmin ... Aspect Lmax Status Connect Form TermType Surging Linkages Name geometry
0 RGI60-15.00001 G102044E29941N 19990920 -9999999 102.044042 29.941000 15 3 0.438 4996 ... 251 850 0 0 0 0 9 9 None POLYGON ((102.03759 29.93828, 102.03759 29.938...
1 RGI60-15.00002 G102042E29987N 19990920 -9999999 102.042346 29.987019 15 3 0.644 4947 ... 244 1021 0 0 0 0 9 9 None POLYGON ((102.04195 29.99030, 102.04197 29.990...
2 RGI60-15.00003 G102041E29997N 19990920 -9999999 102.041130 29.997311 15 3 0.225 5019 ... 274 812 0 0 0 0 9 9 None POLYGON ((102.03710 29.99774, 102.03719 29.998...

3 rows × 23 columns

#project rgi data to match itslive
se_asia_prj = se_asia.to_crs('EPSG:32645') #we know the epsg from looking at the 'spatial epsg' attr of the mapping var of the dc object
se_asia_prj.head(3)
RGIId GLIMSId BgnDate EndDate CenLon CenLat O1Region O2Region Area Zmin ... Aspect Lmax Status Connect Form TermType Surging Linkages Name geometry
0 RGI60-15.00001 G102044E29941N 19990920 -9999999 102.044042 29.941000 15 3 0.438 4996 ... 251 850 0 0 0 0 9 9 None POLYGON ((1959630.570 3408951.748, 1959630.394...
1 RGI60-15.00002 G102042E29987N 19990920 -9999999 102.042346 29.987019 15 3 0.644 4947 ... 244 1021 0 0 0 0 9 9 None POLYGON ((1959271.126 3414873.173, 1959273.308...
2 RGI60-15.00003 G102041E29997N 19990920 -9999999 102.041130 29.997311 15 3 0.225 5019 ... 274 812 0 0 0 0 9 9 None POLYGON ((1958682.136 3415647.929, 1958684.710...

3 rows × 23 columns

Crop RGI to ITS_LIVE extent

  • We’ll use the get_bbox_single() function defined above to get a vector bounding box of the ITS_LIVE granule.

#first, get vector bbox of itslive

bbox_dc = get_bbox_single(dc)
bbox_dc['geometry']

#subset rgi to bounds 
se_asia_subset = gpd.clip(se_asia_prj, bbox_dc)
se_asia_subset
se_asia_subset.explore()
sample_glacier_vec = se_asia_subset.loc[se_asia_subset['RGIId'] == 'RGI60-15.04714']
sample_glacier_vec
RGIId GLIMSId BgnDate EndDate CenLon CenLat O1Region O2Region Area Zmin ... Aspect Lmax Status Connect Form TermType Surging Linkages Name geometry
4713 RGI60-15.04714 G084393E28743N 20010929 -9999999 84.392609 28.743019 15 1 13.593 4481 ... 317 12292 0 0 0 0 9 9 None POLYGON ((242056.441 3181919.463, 242008.653 3...

1 rows × 23 columns

Clip ITS_LIVE dataset to individual glacier extent

First, we need to use rio.write_crs() to assign a CRS to the itslive object. If we don’t do that first, the rio.clip() command will produce an error. Note: you can only run write_crs() once, because it switches mapping from being a data_var to a coord; running it a second time produces a KeyError looking for a variable that no longer exists.
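The promotion from data_var to coord is what makes the second call fail. A minimal illustration of the same move using plain xarray's set_coords (this is an analogy for what write_crs does with the grid-mapping variable, not rioxarray itself):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({'v': (('y', 'x'), np.ones((2, 2)))})
ds['mapping'] = 0                  # a scalar data variable, like the grid-mapping var

print('mapping' in ds.data_vars)   # True: starts out as a data variable
ds = ds.set_coords('mapping')      # promote it to a coordinate
print('mapping' in ds.data_vars)   # False: a second lookup among data_vars fails
print('mapping' in ds.coords)      # True: it now lives in coords
```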

dc_timesorted = dc_timesorted.rio.write_crs(f"epsg:{dc_timesorted.mapping.attrs['spatial_epsg']}", inplace=True)
%%time

sample_glacier_raster = dc_timesorted.rio.clip(sample_glacier_vec.geometry, sample_glacier_vec.crs)
CPU times: user 274 ms, sys: 7.72 ms, total: 281 ms
Wall time: 280 ms
sample_glacier_raster
<xarray.Dataset>
Dimensions:                    (mid_date: 20549, y: 54, x: 100)
Coordinates:
    mapping                    int64 0
  * mid_date                   (mid_date) datetime64[ns] 2013-04-09T16:49:00....
  * y                          (y) float64 3.188e+06 3.188e+06 ... 3.182e+06
  * x                          (x) float64 2.36e+05 2.361e+05 ... 2.479e+05
Data variables: (12/53)
    acquisition_date_img1      (mid_date) datetime64[ns] dask.array<chunksize=(20549,), meta=np.ndarray>
    acquisition_date_img2      (mid_date) datetime64[ns] dask.array<chunksize=(20549,), meta=np.ndarray>
    autoRIFT_software_version  (mid_date) <U5 dask.array<chunksize=(20549,), meta=np.ndarray>
    chip_size_height           (mid_date, y, x) float32 dask.array<chunksize=(49, 20, 21), meta=np.ndarray>
    chip_size_width            (mid_date, y, x) float32 dask.array<chunksize=(49, 20, 21), meta=np.ndarray>
    date_center                (mid_date) datetime64[ns] dask.array<chunksize=(20549,), meta=np.ndarray>
    ...                         ...
    vy_error_mask              (mid_date) float64 dask.array<chunksize=(20549,), meta=np.ndarray>
    vy_error_modeled           (mid_date) float64 dask.array<chunksize=(20549,), meta=np.ndarray>
    vy_error_slow              (mid_date) float64 dask.array<chunksize=(20549,), meta=np.ndarray>
    vy_stable_shift            (mid_date) float64 dask.array<chunksize=(20549,), meta=np.ndarray>
    vy_stable_shift_mask       (mid_date) float64 dask.array<chunksize=(20549,), meta=np.ndarray>
    vy_stable_shift_slow       (mid_date) float64 dask.array<chunksize=(20549,), meta=np.ndarray>
Attributes: (12/18)
    GDAL_AREA_OR_POINT:         Area
    author:                     ITS_LIVE, a NASA MEaSUREs project (its-live.j...
    autoRIFT_parameter_file:    http://its-live-data.s3.amazonaws.com/autorif...
    datacube_software_version:  1.0
    date_created:               09-Jun-2022 04:14:40
    date_updated:               09-Jun-2022 04:14:40
    ...                         ...
    s3:                         s3://its-live-data/datacubes/v02/N20E080/ITS_...
    skipped_granules:           s3://its-live-data/datacubes/v02/N20E080/ITS_...
    time_standard_img1:         UTC
    time_standard_img2:         UTC
    title:                      ITS_LIVE datacube of image_pair velocities
    url:                        https://its-live-data.s3.amazonaws.com/datacu...

Let’s take a look at the clipped raster alongside the vector outline. To start, and for the sake of easy visualizing, we’ll take the mean of the velocity magnitude variable (v) along the mid_date dimension:

fig, ax = plt.subplots(figsize = (15,9))
sample_glacier_vec.plot(ax=ax, facecolor='none', edgecolor='red');
sample_glacier_raster.v.mean(dim=['mid_date']).plot(ax=ax);
_images/ind_glacier_analysis1_39_0.png

Now let’s take a look at the x and y components of velocity, again averaging over time:

fig, axs = plt.subplots(ncols =2, figsize=(17,7))

sample_glacier_raster.vx.mean(dim='mid_date').plot(ax=axs[0])
sample_glacier_raster.vy.mean(dim='mid_date').plot(ax=axs[1])
<matplotlib.collections.QuadMesh at 0x19702b3a0>
_images/ind_glacier_analysis1_41_1.png
sample_glacier_raster.v_error.mean(dim=['mid_date']).plot()
<matplotlib.collections.QuadMesh at 0x1b9ccb9d0>
_images/ind_glacier_analysis1_42_1.png

Exploring ITS_LIVE data

ITS_LIVE data cubes come with many (53!) variables that carry information about the estimated surface velocities and the satellite images that were used to generate them. We won’t examine all of this information here, but let’s look at a little bit.

To start with, let’s look at the satellite imagery used to generate the velocity data.

We see that we have two data_vars that indicate which sensor each image in the image pair at a given time step comes from:

sample_glacier_raster.satellite_img1.data.compute()
array(['8.', '8.', '8.', ..., '2A', '2B', '2B'], dtype='<U2')
sample_glacier_raster.satellite_img2
<xarray.DataArray 'satellite_img2' (mid_date: 20549)>
dask.array<copy, shape=(20549,), dtype=<U2, chunksize=(20549,), chunktype=numpy.ndarray>
Coordinates:
    mapping   int64 0
  * mid_date  (mid_date) datetime64[ns] 2013-04-09T16:49:00.528084992 ... 202...
Attributes:
    description:    id of the satellite that acquired image 2
    standard_name:  image2_satellite

The satellite_img1 and satellite_img2 variables are 1-dimensional arrays whose length matches the mid_date dimension of the data cube. Each element of the array is a string identifying a sensor: 1A = Sentinel-1A, 1B = Sentinel-1B, 2A = Sentinel-2A, 2B = Sentinel-2B, 8. = Landsat 8 and 9. = Landsat 9.

Let’s re-arrange these string arrays into a format that is easier to work with.

First, we’ll make a set of all the different string values in the satellite image variables:

sat_ls1 = list(set(sample_glacier_raster.satellite_img1.compute().data))
sat_ls2 = set(sample_glacier_raster.satellite_img2.compute().data)
# these should contain the same sensors; img1 and img2 should only differ within a
# pair like 2A/2B or 1A/1B, so we shouldn't need to build a mapping for both vars

Next, we’ll assign a value to each element in the set:

mapping = {}

for x in range(len(sat_ls1)):
    mapping[sat_ls1[x]] = x
print('mapping: ', mapping)
print('')
mapping:  {'8.': 0, '9.': 1, '2B': 2, '2A': 3, '1B': 4, '1A': 5}

We’ll then convert each element of the satellite image variable arrays to a binary array that gives us the integer associated with each sensor:

#convert each satellite_img1 value to a one-hot (binary) array indicating the int associated with the sensor
one_hot_encode = []
for c in sample_glacier_raster.satellite_img1.compute().data:
    arr = list(np.zeros(len(sat_ls1), dtype=int))
    arr[mapping[c]]= 1
    one_hot_encode.append(arr)

Back out the sensor integer from the binary array:

sensor_ints = [int(one_hot_encode[x].index(1)) for x in range(len(one_hot_encode))]

Then make a pandas dataframe with each mid_date of the data cube and the sensor integer

dates_ls = list(sample_glacier_raster.mid_date.data)
#make dataframe of sensor ints and associated img date
sat_df = pd.DataFrame({'mid_date1':dates_ls, 'sensor': sensor_ints})
sat_df['mid_date'] = sat_df['mid_date1'].dt.date
sat_df = sat_df.drop('mid_date1', axis=1)
sat_df = sat_df.sort_values(by='mid_date')
sat_df = sat_df.set_index('mid_date')
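As an aside, the one-hot steps above can be condensed: pandas.factorize assigns an integer code to each unique string in a single call, and (unlike iterating over a set) codes sensors in a deterministic first-appearance order. A sketch on toy stand-in values, not tied to the datacube:

```python
import pandas as pd

sats = ['8.', '8.', '2A', '2B', '2A']   # toy stand-in for satellite_img1 values
codes, uniques = pd.factorize(sats)      # codes: int per element; uniques: the strings

mapping = {sat: i for i, sat in enumerate(uniques)}
print(list(codes))   # [0, 0, 1, 2, 1]
print(mapping)       # {'8.': 0, '2A': 1, '2B': 2}
```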

As a first step, let’s visualize the time series of different sensors as a heat map

#make heatmap
pal = sns.color_palette('Paired',6)

fig, ax = plt.subplots(figsize=(20,4))
sns.heatmap(sat_df.T, cmap=pal, ax=ax);
_images/ind_glacier_analysis1_57_0.png

We can wrap those steps into a function to use them more easily:

#help from: https://www.educative.io/answers/one-hot-encoding-in-python
#help from: https://datascienceparichay.com/article/remove-time-from-date-pandas/

def get_satellite_as_int(input_da):

    ''' Function that takes a dask-backed xr.DataArray indicating which sensor the velocity
    data at each time step was collected from. Returns an xr.DataArray of the sensor coded
    as an integer, as well as a pandas df with mid_date as index and the sensor integer as
    a column. Note: the str-to-int mapping is printed but not yet carried through in the return.'''

    #make list of satellite strs
    sat_ls = list(set(input_da.compute().data))
    #map strs to ints
    mapping = {}
    for x in range(len(sat_ls)):
        mapping[sat_ls[x]] = x
    print('mapping: ', mapping)
    #convert each satellite value to a one-hot array indicating the int associated with the sensor
    one_hot_encode = []
    for c in input_da.compute().data:
        arr = list(np.zeros(len(sat_ls), dtype=int))
        arr[mapping[c]] = 1
        one_hot_encode.append(arr)
    sensor_ints = [one_hot_encode[x].index(1) for x in range(len(one_hot_encode))]

    dates_ls = list(input_da.mid_date.data)
    #make dataframe of sensor ints and associated img date
    sat_df = pd.DataFrame({'mid_date1': dates_ls, 'sensor': sensor_ints})
    sat_df = sat_df.sort_values(by='mid_date1')
    sat_df = sat_df.set_index('mid_date1')

    sat_xr = sat_df['sensor'].to_xarray()

    return sat_xr, sat_df
a = get_satellite_as_int(sample_glacier_raster.satellite_img1.compute())[0]
mapping:  {'8.': 0, '9.': 1, '2B': 2, '2A': 3, '1B': 4, '1A': 5}

And if we want we can add this new xarray.DataArray back as a data_var in the original xarray.Dataset:

sample_glacier_raster['satellite_img_int'] = ('mid_date', a.data)
sample_glacier_raster
<xarray.Dataset>
Dimensions:                    (mid_date: 20549, y: 54, x: 100)
Coordinates:
    mapping                    int64 0
  * mid_date                   (mid_date) datetime64[ns] 2013-04-09T16:49:00....
  * y                          (y) float64 3.188e+06 3.188e+06 ... 3.182e+06
  * x                          (x) float64 2.36e+05 2.361e+05 ... 2.479e+05
Data variables: (12/54)
    acquisition_date_img1      (mid_date) datetime64[ns] dask.array<chunksize=(20549,), meta=np.ndarray>
    acquisition_date_img2      (mid_date) datetime64[ns] dask.array<chunksize=(20549,), meta=np.ndarray>
    autoRIFT_software_version  (mid_date) <U5 dask.array<chunksize=(20549,), meta=np.ndarray>
    chip_size_height           (mid_date, y, x) float32 dask.array<chunksize=(49, 20, 21), meta=np.ndarray>
    chip_size_width            (mid_date, y, x) float32 dask.array<chunksize=(49, 20, 21), meta=np.ndarray>
    date_center                (mid_date) datetime64[ns] dask.array<chunksize=(20549,), meta=np.ndarray>
    ...                         ...
    vy_error_modeled           (mid_date) float64 dask.array<chunksize=(20549,), meta=np.ndarray>
    vy_error_slow              (mid_date) float64 dask.array<chunksize=(20549,), meta=np.ndarray>
    vy_stable_shift            (mid_date) float64 dask.array<chunksize=(20549,), meta=np.ndarray>
    vy_stable_shift_mask       (mid_date) float64 dask.array<chunksize=(20549,), meta=np.ndarray>
    vy_stable_shift_slow       (mid_date) float64 dask.array<chunksize=(20549,), meta=np.ndarray>
    satellite_img_int          (mid_date) int64 0 0 0 0 0 0 0 ... 2 3 2 3 3 2 2
Attributes: (12/18)
    GDAL_AREA_OR_POINT:         Area
    author:                     ITS_LIVE, a NASA MEaSUREs project (its-live.j...
    autoRIFT_parameter_file:    http://its-live-data.s3.amazonaws.com/autorif...
    datacube_software_version:  1.0
    date_created:               09-Jun-2022 04:14:40
    date_updated:               09-Jun-2022 04:14:40
    ...                         ...
    s3:                         s3://its-live-data/datacubes/v02/N20E080/ITS_...
    skipped_granules:           s3://its-live-data/datacubes/v02/N20E080/ITS_...
    time_standard_img1:         UTC
    time_standard_img2:         UTC
    title:                      ITS_LIVE datacube of image_pair velocities
    url:                        https://its-live-data.s3.amazonaws.com/datacu...
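
The ('mid_date', a.data) assignment above uses xarray’s (dims, values) tuple syntax for adding a variable. The same pattern on a toy dataset (names are illustrative):

```python
import xarray as xr

# start from a dataset with only a coordinate
ds = xr.Dataset(coords={"t": [0, 1, 2]})
# assign a new data variable from a (dims, values) tuple
ds["sensor"] = ("t", [5, 1, 5])
```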

What if we only want to look at velocity estimates from Landsat 8? Remember that the string-to-integer mapping comes from a set() and can differ between runs, so use whichever integer corresponds to the Landsat 8 key ('8.') in the mapping printed above:

l8_data = sample_glacier_raster.where(sample_glacier_raster['satellite_img_int'] == 5., drop=True)
l8_data
<xarray.Dataset>
Dimensions:                    (mid_date: 448, y: 54, x: 100)
Coordinates:
    mapping                    int64 0
  * mid_date                   (mid_date) datetime64[ns] 2014-10-13T00:19:06....
  * y                          (y) float64 3.188e+06 3.188e+06 ... 3.182e+06
  * x                          (x) float64 2.36e+05 2.361e+05 ... 2.479e+05
Data variables: (12/54)
    acquisition_date_img1      (mid_date) datetime64[ns] dask.array<chunksize=(448,), meta=np.ndarray>
    acquisition_date_img2      (mid_date) datetime64[ns] dask.array<chunksize=(448,), meta=np.ndarray>
    autoRIFT_software_version  (mid_date) object dask.array<chunksize=(448,), meta=np.ndarray>
    chip_size_height           (mid_date, y, x) float32 dask.array<chunksize=(2, 20, 21), meta=np.ndarray>
    chip_size_width            (mid_date, y, x) float32 dask.array<chunksize=(2, 20, 21), meta=np.ndarray>
    date_center                (mid_date) datetime64[ns] dask.array<chunksize=(448,), meta=np.ndarray>
    ...                         ...
    vy_error_modeled           (mid_date) float64 dask.array<chunksize=(448,), meta=np.ndarray>
    vy_error_slow              (mid_date) float64 dask.array<chunksize=(448,), meta=np.ndarray>
    vy_stable_shift            (mid_date) float64 dask.array<chunksize=(448,), meta=np.ndarray>
    vy_stable_shift_mask       (mid_date) float64 dask.array<chunksize=(448,), meta=np.ndarray>
    vy_stable_shift_slow       (mid_date) float64 dask.array<chunksize=(448,), meta=np.ndarray>
    satellite_img_int          (mid_date) float64 5.0 5.0 5.0 ... 5.0 5.0 5.0
Attributes: (12/18)
    GDAL_AREA_OR_POINT:         Area
    author:                     ITS_LIVE, a NASA MEaSUREs project (its-live.j...
    autoRIFT_parameter_file:    http://its-live-data.s3.amazonaws.com/autorif...
    datacube_software_version:  1.0
    date_created:               09-Jun-2022 04:14:40
    date_updated:               09-Jun-2022 04:14:40
    ...                         ...
    s3:                         s3://its-live-data/datacubes/v02/N20E080/ITS_...
    skipped_granules:           s3://its-live-data/datacubes/v02/N20E080/ITS_...
    time_standard_img1:         UTC
    time_standard_img2:         UTC
    title:                      ITS_LIVE datacube of image_pair velocities
    url:                        https://its-live-data.s3.amazonaws.com/datacu...

At first, dataset.where() seems like the right tool for this kind of operation, but there’s actually a much easier way (thank you Deepak!). Because we are selecting along a single dimension (mid_date), we can use xarray’s .sel() method instead. This is more efficient and integrates more smoothly with dask arrays.

l8_condition = sample_glacier_raster.satellite_img_int.isin(5.)
l8_subset = sample_glacier_raster.sel(mid_date=l8_condition)
l8_subset
<xarray.Dataset>
Dimensions:                    (mid_date: 448, y: 54, x: 100)
Coordinates:
    mapping                    int64 0
  * mid_date                   (mid_date) datetime64[ns] 2014-10-13T00:19:06....
  * y                          (y) float64 3.188e+06 3.188e+06 ... 3.182e+06
  * x                          (x) float64 2.36e+05 2.361e+05 ... 2.479e+05
Data variables: (12/54)
    acquisition_date_img1      (mid_date) datetime64[ns] dask.array<chunksize=(448,), meta=np.ndarray>
    acquisition_date_img2      (mid_date) datetime64[ns] dask.array<chunksize=(448,), meta=np.ndarray>
    autoRIFT_software_version  (mid_date) <U5 dask.array<chunksize=(448,), meta=np.ndarray>
    chip_size_height           (mid_date, y, x) float32 dask.array<chunksize=(2, 20, 21), meta=np.ndarray>
    chip_size_width            (mid_date, y, x) float32 dask.array<chunksize=(2, 20, 21), meta=np.ndarray>
    date_center                (mid_date) datetime64[ns] dask.array<chunksize=(448,), meta=np.ndarray>
    ...                         ...
    vy_error_modeled           (mid_date) float64 dask.array<chunksize=(448,), meta=np.ndarray>
    vy_error_slow              (mid_date) float64 dask.array<chunksize=(448,), meta=np.ndarray>
    vy_stable_shift            (mid_date) float64 dask.array<chunksize=(448,), meta=np.ndarray>
    vy_stable_shift_mask       (mid_date) float64 dask.array<chunksize=(448,), meta=np.ndarray>
    vy_stable_shift_slow       (mid_date) float64 dask.array<chunksize=(448,), meta=np.ndarray>
    satellite_img_int          (mid_date) int64 5 5 5 5 5 5 5 ... 5 5 5 5 5 5 5
Attributes: (12/18)
    GDAL_AREA_OR_POINT:         Area
    author:                     ITS_LIVE, a NASA MEaSUREs project (its-live.j...
    autoRIFT_parameter_file:    http://its-live-data.s3.amazonaws.com/autorif...
    datacube_software_version:  1.0
    date_created:               09-Jun-2022 04:14:40
    date_updated:               09-Jun-2022 04:14:40
    ...                         ...
    s3:                         s3://its-live-data/datacubes/v02/N20E080/ITS_...
    skipped_granules:           s3://its-live-data/datacubes/v02/N20E080/ITS_...
    time_standard_img1:         UTC
    time_standard_img2:         UTC
    title:                      ITS_LIVE datacube of image_pair velocities
    url:                        https://its-live-data.s3.amazonaws.com/datacu...
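
One difference worth noticing between the two approaches: in the .where(drop=True) output, satellite_img_int came back as float64, while .sel kept it int64. That is because where masks with NaN before dropping, which forces integer variables to float. A minimal sketch of both on toy data (variable names are illustrative):

```python
import pandas as pd
import xarray as xr

ds = xr.Dataset(
    {"v": ("t", [1.0, 2.0, 3.0, 4.0]), "sensor": ("t", [5, 1, 5, 2])},
    coords={"t": pd.date_range("2020-01-01", periods=4)},
)
via_where = ds.where(ds.sensor == 5, drop=True)  # sensor comes back float64
via_sel = ds.sel(t=(ds.sensor == 5))             # sensor stays int64
```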

This subset contains 448 of the original 20,549 time steps, only about 2% of the record. Let’s take a look at the average speeds of the Landsat 8-derived velocities:

l8_subset.v.mean(dim='mid_date').plot()
<matplotlib.collections.QuadMesh at 0x1b8e98eb0>
_images/ind_glacier_analysis1_68_1.png

What about Landsat 9?

l9_condition = sample_glacier_raster.satellite_img_int.isin(1.)

l9_subset = sample_glacier_raster.sel(mid_date=l9_condition)
l9_subset
<xarray.Dataset>
Dimensions:                    (mid_date: 46, y: 54, x: 100)
Coordinates:
    mapping                    int64 0
  * mid_date                   (mid_date) datetime64[ns] 2021-11-06T16:48:19....
  * y                          (y) float64 3.188e+06 3.188e+06 ... 3.182e+06
  * x                          (x) float64 2.36e+05 2.361e+05 ... 2.479e+05
Data variables: (12/54)
    acquisition_date_img1      (mid_date) datetime64[ns] dask.array<chunksize=(46,), meta=np.ndarray>
    acquisition_date_img2      (mid_date) datetime64[ns] dask.array<chunksize=(46,), meta=np.ndarray>
    autoRIFT_software_version  (mid_date) <U5 dask.array<chunksize=(46,), meta=np.ndarray>
    chip_size_height           (mid_date, y, x) float32 dask.array<chunksize=(2, 20, 21), meta=np.ndarray>
    chip_size_width            (mid_date, y, x) float32 dask.array<chunksize=(2, 20, 21), meta=np.ndarray>
    date_center                (mid_date) datetime64[ns] dask.array<chunksize=(46,), meta=np.ndarray>
    ...                         ...
    vy_error_modeled           (mid_date) float64 dask.array<chunksize=(46,), meta=np.ndarray>
    vy_error_slow              (mid_date) float64 dask.array<chunksize=(46,), meta=np.ndarray>
    vy_stable_shift            (mid_date) float64 dask.array<chunksize=(46,), meta=np.ndarray>
    vy_stable_shift_mask       (mid_date) float64 dask.array<chunksize=(46,), meta=np.ndarray>
    vy_stable_shift_slow       (mid_date) float64 dask.array<chunksize=(46,), meta=np.ndarray>
    satellite_img_int          (mid_date) int64 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1
Attributes: (12/18)
    GDAL_AREA_OR_POINT:         Area
    author:                     ITS_LIVE, a NASA MEaSUREs project (its-live.j...
    autoRIFT_parameter_file:    http://its-live-data.s3.amazonaws.com/autorif...
    datacube_software_version:  1.0
    date_created:               09-Jun-2022 04:14:40
    date_updated:               09-Jun-2022 04:14:40
    ...                         ...
    s3:                         s3://its-live-data/datacubes/v02/N20E080/ITS_...
    skipped_granules:           s3://its-live-data/datacubes/v02/N20E080/ITS_...
    time_standard_img1:         UTC
    time_standard_img2:         UTC
    title:                      ITS_LIVE datacube of image_pair velocities
    url:                        https://its-live-data.s3.amazonaws.com/datacu...

Only 46 time steps contain data from Landsat 9, which makes sense given that Landsat 9 launched in September 2021.

l9_subset.v.mean(dim='mid_date').plot()
<matplotlib.collections.QuadMesh at 0x1baf10910>
_images/ind_glacier_analysis1_72_1.png

Now let’s look at the Sentinel-1 data. Sentinel-1 is a two-satellite constellation (1A and 1B), so this time we select for two values instead of one:

s1_condition = sample_glacier_raster.satellite_img_int.isin([0,2])
s1_subset = sample_glacier_raster.sel(mid_date = s1_condition)
s1_subset
<xarray.Dataset>
Dimensions:                    (mid_date: 13934, y: 54, x: 100)
Coordinates:
    mapping                    int64 0
  * mid_date                   (mid_date) datetime64[ns] 2013-04-09T16:49:00....
  * y                          (y) float64 3.188e+06 3.188e+06 ... 3.182e+06
  * x                          (x) float64 2.36e+05 2.361e+05 ... 2.479e+05
Data variables: (12/54)
    acquisition_date_img1      (mid_date) datetime64[ns] dask.array<chunksize=(13934,), meta=np.ndarray>
    acquisition_date_img2      (mid_date) datetime64[ns] dask.array<chunksize=(13934,), meta=np.ndarray>
    autoRIFT_software_version  (mid_date) <U5 dask.array<chunksize=(13934,), meta=np.ndarray>
    chip_size_height           (mid_date, y, x) float32 dask.array<chunksize=(49, 20, 21), meta=np.ndarray>
    chip_size_width            (mid_date, y, x) float32 dask.array<chunksize=(49, 20, 21), meta=np.ndarray>
    date_center                (mid_date) datetime64[ns] dask.array<chunksize=(13934,), meta=np.ndarray>
    ...                         ...
    vy_error_modeled           (mid_date) float64 dask.array<chunksize=(13934,), meta=np.ndarray>
    vy_error_slow              (mid_date) float64 dask.array<chunksize=(13934,), meta=np.ndarray>
    vy_stable_shift            (mid_date) float64 dask.array<chunksize=(13934,), meta=np.ndarray>
    vy_stable_shift_mask       (mid_date) float64 dask.array<chunksize=(13934,), meta=np.ndarray>
    vy_stable_shift_slow       (mid_date) float64 dask.array<chunksize=(13934,), meta=np.ndarray>
    satellite_img_int          (mid_date) int64 0 0 0 0 0 0 0 ... 2 2 2 2 2 2 2
Attributes: (12/18)
    GDAL_AREA_OR_POINT:         Area
    author:                     ITS_LIVE, a NASA MEaSUREs project (its-live.j...
    autoRIFT_parameter_file:    http://its-live-data.s3.amazonaws.com/autorif...
    datacube_software_version:  1.0
    date_created:               09-Jun-2022 04:14:40
    date_updated:               09-Jun-2022 04:14:40
    ...                         ...
    s3:                         s3://its-live-data/datacubes/v02/N20E080/ITS_...
    skipped_granules:           s3://its-live-data/datacubes/v02/N20E080/ITS_...
    time_standard_img1:         UTC
    time_standard_img2:         UTC
    title:                      ITS_LIVE datacube of image_pair velocities
    url:                        https://its-live-data.s3.amazonaws.com/datacu...
s1_subset.v.mean(dim='mid_date').plot()
<matplotlib.collections.QuadMesh at 0x1b9449600>
_images/ind_glacier_analysis1_75_1.png
And the same for Sentinel-2:

s2_condition = sample_glacier_raster.satellite_img_int.isin([3,4])
s2_subset = sample_glacier_raster.sel(mid_date=s2_condition)
s2_subset
<xarray.Dataset>
Dimensions:                    (mid_date: 6121, y: 54, x: 100)
Coordinates:
    mapping                    int64 0
  * mid_date                   (mid_date) datetime64[ns] 2016-02-26T05:07:11....
  * y                          (y) float64 3.188e+06 3.188e+06 ... 3.182e+06
  * x                          (x) float64 2.36e+05 2.361e+05 ... 2.479e+05
Data variables: (12/54)
    acquisition_date_img1      (mid_date) datetime64[ns] dask.array<chunksize=(6121,), meta=np.ndarray>
    acquisition_date_img2      (mid_date) datetime64[ns] dask.array<chunksize=(6121,), meta=np.ndarray>
    autoRIFT_software_version  (mid_date) <U5 dask.array<chunksize=(6121,), meta=np.ndarray>
    chip_size_height           (mid_date, y, x) float32 dask.array<chunksize=(1, 20, 21), meta=np.ndarray>
    chip_size_width            (mid_date, y, x) float32 dask.array<chunksize=(1, 20, 21), meta=np.ndarray>
    date_center                (mid_date) datetime64[ns] dask.array<chunksize=(6121,), meta=np.ndarray>
    ...                         ...
    vy_error_modeled           (mid_date) float64 dask.array<chunksize=(6121,), meta=np.ndarray>
    vy_error_slow              (mid_date) float64 dask.array<chunksize=(6121,), meta=np.ndarray>
    vy_stable_shift            (mid_date) float64 dask.array<chunksize=(6121,), meta=np.ndarray>
    vy_stable_shift_mask       (mid_date) float64 dask.array<chunksize=(6121,), meta=np.ndarray>
    vy_stable_shift_slow       (mid_date) float64 dask.array<chunksize=(6121,), meta=np.ndarray>
    satellite_img_int          (mid_date) int64 3 3 3 3 3 3 3 ... 3 3 3 3 3 3 3
Attributes: (12/18)
    GDAL_AREA_OR_POINT:         Area
    author:                     ITS_LIVE, a NASA MEaSUREs project (its-live.j...
    autoRIFT_parameter_file:    http://its-live-data.s3.amazonaws.com/autorif...
    datacube_software_version:  1.0
    date_created:               09-Jun-2022 04:14:40
    date_updated:               09-Jun-2022 04:14:40
    ...                         ...
    s3:                         s3://its-live-data/datacubes/v02/N20E080/ITS_...
    skipped_granules:           s3://its-live-data/datacubes/v02/N20E080/ITS_...
    time_standard_img1:         UTC
    time_standard_img2:         UTC
    title:                      ITS_LIVE datacube of image_pair velocities
    url:                        https://its-live-data.s3.amazonaws.com/datacu...
s2_subset.v.mean(dim='mid_date').plot()
<matplotlib.collections.QuadMesh at 0x1bbc322c0>
_images/ind_glacier_analysis1_77_1.png
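
Rather than building a separate subset for each sensor, the per-sensor means can also be computed in a single pass with groupby over the sensor code (e.g. grouping sample_glacier_raster.v by satellite_img_int). A sketch on a one-dimensional toy array:

```python
import xarray as xr

v = xr.DataArray(
    [1.0, 2.0, 3.0, 4.0],
    dims="t",
    coords={"sensor": ("t", [5, 1, 5, 1])},
)
# one mean per unique sensor code: sensor 1 -> 3.0, sensor 5 -> 2.0
per_sensor_mean = v.groupby("sensor").mean()
```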
#categorical palette with one color per sensor, for later plots
pal = sns.color_palette('Paired', 6)
pal

Seasonal mean velocities with groupby

#first define the function we'll apply to each group
def middate_mean(a):
    return a.mean(dim='mid_date')
seasons_gb = sample_glacier_raster.groupby(sample_glacier_raster.mid_date.dt.season).map(middate_mean)
#groupby operations drop dataset attrs, so re-attach them from the original
seasons_gb.attrs = sample_glacier_raster.attrs
seasons_gb
<xarray.Dataset>
Dimensions:               (y: 54, x: 100, season: 4)
Coordinates:
    mapping               int64 0
  * y                     (y) float64 3.188e+06 3.188e+06 ... 3.182e+06
  * x                     (x) float64 2.36e+05 2.361e+05 ... 2.477e+05 2.479e+05
  * season                (season) object 'DJF' 'JJA' 'MAM' 'SON'
Data variables: (12/43)
    chip_size_height      (season, y, x) float32 dask.array<chunksize=(1, 20, 21), meta=np.ndarray>
    chip_size_width       (season, y, x) float32 dask.array<chunksize=(1, 20, 21), meta=np.ndarray>
    date_dt               (season) timedelta64[ns] dask.array<chunksize=(1,), meta=np.ndarray>
    interp_mask           (season, y, x) float32 dask.array<chunksize=(1, 20, 21), meta=np.ndarray>
    roi_valid_percentage  (season) float64 dask.array<chunksize=(1,), meta=np.ndarray>
    stable_count_mask     (season) float64 dask.array<chunksize=(1,), meta=np.ndarray>
    ...                    ...
    vy_error_modeled      (season) float64 dask.array<chunksize=(1,), meta=np.ndarray>
    vy_error_slow         (season) float64 dask.array<chunksize=(1,), meta=np.ndarray>
    vy_stable_shift       (season) float64 dask.array<chunksize=(1,), meta=np.ndarray>
    vy_stable_shift_mask  (season) float64 dask.array<chunksize=(1,), meta=np.ndarray>
    vy_stable_shift_slow  (season) float64 dask.array<chunksize=(1,), meta=np.ndarray>
    satellite_img_int     (season) float64 1.728 1.641 1.584 1.663
Attributes: (12/18)
    GDAL_AREA_OR_POINT:         Area
    author:                     ITS_LIVE, a NASA MEaSUREs project (its-live.j...
    autoRIFT_parameter_file:    http://its-live-data.s3.amazonaws.com/autorif...
    datacube_software_version:  1.0
    date_created:               09-Jun-2022 04:14:40
    date_updated:               09-Jun-2022 04:14:40
    ...                         ...
    s3:                         s3://its-live-data/datacubes/v02/N20E080/ITS_...
    skipped_granules:           s3://its-live-data/datacubes/v02/N20E080/ITS_...
    time_standard_img1:         UTC
    time_standard_img2:         UTC
    title:                      ITS_LIVE datacube of image_pair velocities
    url:                        https://its-live-data.s3.amazonaws.com/datacu...
fg = seasons_gb.v.plot(
    col='season',
    vmax = 150
)
_images/ind_glacier_analysis1_89_0.png
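
As an aside, the middate_mean/.map() pattern above is equivalent to the groupby object’s built-in reduction, i.e. sample_glacier_raster.groupby('mid_date.season').mean(). A self-contained sketch with synthetic monthly data:

```python
import numpy as np
import pandas as pd
import xarray as xr

# one value per month for a year: Jan=0, Feb=1, ..., Dec=11
times = pd.date_range("2020-01-01", periods=12, freq="MS")
da = xr.DataArray(np.arange(12.0), dims="time", coords={"time": times})

# both forms reduce each seasonal group over the time dimension
via_map = da.groupby("time.season").map(lambda a: a.mean(dim="time"))
via_mean = da.groupby("time.season").mean()
```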